75 research outputs found

    FAIR Data Model for Chemical Substances: Development Challenges, Management Strategies, and Applications

    Data models for the representation of chemicals are at the core of cheminformatics processing workflows. The standard triple (structure, properties, descriptors) has traditionally formalized a molecule and has been the dominant paradigm for several decades. While this approach is useful and widely adopted in academia, regulatory bodies and industry have more complex use cases and rely on the concept of a chemical substance, which applies to multicomponent materials, advanced materials, and nanomaterials. The chemical substance data model is an extension of the molecule representation that takes into account the practical aspects of chemical data management as well as emerging research challenges and discussions among academia, industry, and regulators. The substance paradigm must handle a composition of multiple components, and mandatory metadata is packed together with the experimental and theoretical data. Elucidating such a data model poses challenges regarding metadata, ontology utilization, and the adoption of FAIR principles. We illustrate the adoption of these good practices by means of the Ambit/eNanoMapper data model, which is applied to chemical substances originating from ECHA REACH dossiers and to the largest nanosafety database in Europe. The Ambit/eNanoMapper model enables the development of tools for data curation, FAIRification of large collections of nanosafety data, ontology annotation, data conversion to standards such as JSON, RDF, and HDF5, and emerging linear notations for chemical substances.
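
    A minimal sketch of what a substance-centric record might look like, loosely inspired by the model described above. The field names and the example values are illustrative assumptions, not the actual Ambit/eNanoMapper schema; JSON export is shown because the abstract names JSON among the supported conversions.

    ```python
    from dataclasses import dataclass, field, asdict
    from typing import Optional
    import json

    @dataclass
    class Component:
        structure: str                       # e.g. a SMILES string for this component
        role: str                            # e.g. "CORE", "COATING", "IMPURITY"
        proportion: Optional[float] = None   # mass fraction, if known

    @dataclass
    class Substance:
        name: str
        substance_type: str                          # e.g. "nanomaterial"
        components: list = field(default_factory=list)
        metadata: dict = field(default_factory=dict)      # mandatory provenance
        measurements: list = field(default_factory=list)  # experimental/theoretical data

    nm = Substance(
        name="coated TiO2 nanoparticle",
        substance_type="nanomaterial",
        components=[Component("O=[Ti]=O", "CORE"),
                    Component("OCC(O)CO", "COATING")],
        metadata={"source": "illustrative example"},
    )
    print(json.dumps(asdict(nm), indent=2))  # one possible serialization target
    ```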

    The Benigni / Bossa Rulebase for Mutagenicity and Carcinogenicity - A Module of Toxtree

    The Joint Research Centre's European Chemicals Bureau has developed a hazard estimation software called Toxtree, capable of making structure-based predictions for a number of toxicological endpoints. One of the modules developed as an extension to Toxtree is aimed at the prediction of carcinogenicity and mutagenicity. This module encodes the Benigni/Bossa rulebase for carcinogenicity and mutagenicity developed by Romualdo Benigni and Cecilia Bossa at the Istituto Superiore di Sanità in Rome, Italy. The module was coded by the Toxtree programmer, Ideaconsult Ltd, Bulgaria. In the Toxtree implementation of this rulebase, the processing of a query chemical gives rise to a limited number of different outcomes, namely: a) no structural alerts for carcinogenicity are recognised; b) one or more structural alerts (SAs) are recognised for genotoxic or non-genotoxic carcinogenicity; c) SAs for aromatic amines or α,β-unsaturated aldehydes are recognised, and the chemical goes through Quantitative Structure-Activity Relationship (QSAR) analysis, which may result in a negative or positive outcome. If the query chemical belongs to the classes of aromatic amines or α,β-unsaturated aldehydes, the appropriate QSAR is applied and provides a more refined assessment than the SAs, and should be given higher importance in a weight-of-evidence scheme. This report gives an introduction to currently available QSARs and SAs for carcinogenicity and mutagenicity, and provides details of the Benigni/Bossa rulebase. JRC.I.3 - Consumer products safety and quality.
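
    A schematic of the a)/b)/c) decision flow described above, not the actual Toxtree code. The helper functions are stubs standing in for the real SMARTS-based alert matching and the class-specific QSARs, and the dict-based chemical is a toy input.

    ```python
    QSAR_CLASSES = {"aromatic amine", "alpha,beta-unsaturated aldehyde"}

    def find_structural_alerts(chemical):
        # Stub: a real implementation would run structural-alert matching.
        return chemical.get("alerts", [])

    def apply_qsar(chemical, chemical_classes):
        # Stub: a real implementation would evaluate the class-specific QSAR.
        return "positive" if chemical.get("qsar_positive") else "negative"

    def assess(chemical):
        alerts = find_structural_alerts(chemical)
        if not alerts:
            return "a) no structural alerts for carcinogenicity"
        hits = {a["class"] for a in alerts} & QSAR_CLASSES
        if hits:
            # The QSAR refines the alert and should carry more weight in a
            # weight-of-evidence scheme.
            return f"c) alert(s) with QSAR outcome: {apply_qsar(chemical, hits)}"
        return "b) alert(s) for genotoxic/non-genotoxic carcinogenicity"

    print(assess({"alerts": [{"class": "aromatic amine"}], "qsar_positive": True}))
    ```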

    OpenTox predictive toxicology framework: toxicological ontology and semantic media wiki-based OpenToxipedia

    Background: The OpenTox Framework, developed by the partners in the OpenTox project (http://www.opentox.org), aims at providing unified access to toxicity data, predictive models and validation procedures. Interoperability of resources is achieved using a common information model, based on the OpenTox ontologies, describing predictive algorithms, models and toxicity data. As toxicological data may come from different, heterogeneous sources, a deployed ontology, unifying the terminology and the resources, is critical for the rational and reliable organization of the data and its automatic processing. Results: The following related ontologies have been developed for OpenTox: a) Toxicological ontology – listing the toxicological endpoints; b) Organs system and Effects ontology – addressing organs, targets/examinations and effects observed in in vivo studies; c) ToxML ontology – representing a semi-automatic conversion of the ToxML schema; d) OpenTox ontology – representation of OpenTox framework components: chemical compounds, datasets, types of algorithms, models and validation web services; e) ToxLink–ToxCast assays ontology; and f) OpenToxipedia, a community knowledge resource on toxicology terminology. OpenTox components are made available through standardized REST web services, where every compound, data set, and predictive method has a unique resolvable address (URI), used to retrieve its Resource Description Framework (RDF) representation, or to initiate the associated calculations and generate new RDF-based resources. The services support the integration of toxicity and chemical data from various sources, the generation and validation of computer models for toxic effects, seamless integration of new algorithms and scientifically sound validation routines, and provide a flexible framework that allows building an arbitrary number of applications tailored to solving different problems by end users (e.g. toxicologists). Availability: The OpenTox toxicological ontology projects may be accessed via the OpenTox ontology development page http://www.opentox.org/dev/ontology; the OpenTox ontology is available as OWL at http://opentox.org/api/1.1/opentox.owl, and the ToxML-OWL conversion utility is an open source resource available at http://ambit.svn.sourceforge.net/viewvc/ambit/branches/toxml-utils/.
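
    Since the ontology is published as OWL, it can be inspected with standard RDF tooling. A minimal sketch with rdflib, assuming the OWL file is still resolvable at the address given in the abstract:

    ```python
    from rdflib import Graph
    from rdflib.namespace import RDF, OWL

    g = Graph()
    # Parse the published OpenTox ontology (RDF/XML serialization of OWL).
    g.parse("http://opentox.org/api/1.1/opentox.owl", format="xml")

    # List the classes the ontology defines (compounds, datasets, algorithms, ...).
    for cls in g.subjects(RDF.type, OWL.Class):
        print(cls)
    ```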

    AMBIT RESTful web services: an implementation of the OpenTox application programming interface

    The AMBIT web services package is one of several existing independent implementations of the OpenTox Application Programming Interface and is built according to the principles of the Representational State Transfer (REST) architecture. The Open Source Predictive Toxicology Framework, developed by the partners in the EC FP7 OpenTox project, aims at providing unified access to toxicity data and predictive models, as well as validation procedures. This is achieved by i) an information model, based on a common OWL-DL ontology; ii) links to related ontologies; and iii) data and algorithms, available through a standardized REST web services interface, where every compound, data set or predictive method has a unique web address, used to retrieve its Resource Description Framework (RDF) representation, or to initiate the associated calculations.
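
    A sketch of the REST pattern described above: each compound has a stable URI, and HTTP content negotiation selects the representation. The host and compound identifier below are hypothetical placeholders; substitute a running AMBIT instance.

    ```python
    import requests

    compound_uri = "https://example-ambit-host/compound/1"  # hypothetical URI
    resp = requests.get(compound_uri,
                        headers={"Accept": "application/rdf+xml"})
    resp.raise_for_status()
    print(resp.text[:500])  # start of the RDF description of the compound
    ```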

    Industry-scale application and evaluation of deep learning for drug target prediction

    Artificial intelligence (AI) is undergoing a revolution thanks to the breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent work on publicly available pharmaceutical data showed that AI methods are highly promising for drug target prediction. However, the quality of public data may differ from that of industry data, owing to different labs reporting measurements, different measurement techniques, fewer samples, and less diverse and specialized assays. As part of a European-funded project (ExCAPE) that brought together expertise from the pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models obtained from public data can be transferred to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed maintain their predictive power to a large degree when applied to industry data. Moreover, we observed that deep learning models outperformed comparable models trained with other machine learning algorithms when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study evaluating the potential of machine learning, and especially deep learning, directly in industry-scale settings, and moreover investigating the transferability of publicly learned target prediction models towards industrial bioactivity prediction pipelines.
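
    A toy sketch of the transferability question studied above: fit a model on a "public" training set, then measure how much predictive power survives on a held-out "industry" set. The data here is synthetic random noise purely to show the evaluation pattern, not to reproduce the study's models or results.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    # Stand-ins for public and internal bioactivity matrices (synthetic).
    X_pub, y_pub = rng.normal(size=(1000, 64)), rng.integers(0, 2, 1000)
    X_ind, y_ind = rng.normal(size=(300, 64)), rng.integers(0, 2, 300)

    model = LogisticRegression(max_iter=1000).fit(X_pub, y_pub)
    # Transferability check: evaluate the publicly trained model on the
    # industry-like set (near 0.5 here, since the toy data is random).
    print("AUC on industry-like set:",
          roc_auc_score(y_ind, model.predict_proba(X_ind)[:, 1]))
    ```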

    The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

    Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms, ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community-driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer-reviewed publishing platform for scientific computing software.
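
    The CDK is a Java library; one way to sketch its substructure-searching workflow from Python is via scyjava, which pulls the CDK jar from Maven and imports its classes. The artifact coordinates and package paths below reflect recent CDK releases but should be treated as assumptions to verify against the version you use.

    ```python
    import scyjava

    # Fetch the CDK uber-jar from Maven Central (assumed coordinates).
    scyjava.config.endpoints.append("org.openscience.cdk:cdk-bundle:2.9")

    SmilesParser = scyjava.jimport("org.openscience.cdk.smiles.SmilesParser")
    SilentChemObjectBuilder = scyjava.jimport(
        "org.openscience.cdk.silent.SilentChemObjectBuilder")
    SmartsPattern = scyjava.jimport("org.openscience.cdk.smarts.SmartsPattern")

    sp = SmilesParser(SilentChemObjectBuilder.getInstance())
    mol = sp.parseSmiles("c1ccccc1O")            # phenol
    pattern = SmartsPattern.create("c1ccccc1")   # benzene-ring query
    print(pattern.matches(mol))                  # True: ring found in phenol
    ```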

    The eNanoMapper database for nanomaterial safety information

    Background: The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment of ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components, and web services. We have recently described the design of the eNanoMapper prototype database, along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user-friendly data preparation and upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the "representational state transfer" (REST) API enables building user-friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure-activity relationships for nanomaterials (NanoQSAR).
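
    A sketch of pulling substance records through an eNanoMapper-style REST API. The host assumes the public demonstration instance at data.enanomapper.net is reachable, and the query parameters and JSON field names are assumptions based on the REST design described above.

    ```python
    import requests

    resp = requests.get("https://data.enanomapper.net/substance",
                        headers={"Accept": "application/json"},
                        params={"page": 0, "pagesize": 5})
    resp.raise_for_status()

    # Assumed response shape: {"substance": [{"name": ..., "substanceType": ...}]}
    for s in resp.json().get("substance", []):
        print(s.get("name"), "-", s.get("substanceType"))
    ```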

    Chemical Similarity and Threshold of Toxicological Concern (TTC) Approaches: Report of an ECB Workshop held in Ispra, November 2005

    There are many national, regional and international programmes – either regulatory or voluntary – to assess the hazards or risks of chemical substances to humans and the environment. The first step in making a hazard assessment of a chemical is to ensure that there is adequate information on each of the endpoints. If adequate information is not available, then additional data are needed to complete the dataset for the substance. For reasons of resources and animal welfare, it is important to limit the number of tests that have to be conducted, where this is scientifically justifiable. One approach is to consider closely related chemicals as a group, or chemical category, rather than as individual chemicals. In a category approach, data for chemicals and endpoints that have already been tested are used to estimate the hazard for untested chemicals and endpoints. Categories of chemicals are selected on the basis of similarities in biological activity that are associated with a common underlying mechanism of action. A homologous series of chemicals exhibiting a coherent trend in biological activity can be rationalised on the basis of a constant change in structure. This type of grouping is relatively straightforward. The challenge lies in identifying the relevant chemical, structural and physicochemical characteristics that enable more sophisticated groupings to be made on the basis of similarity in biological activity and hence purported mechanism of action. Linking two chemicals together and rationalising their similarity with reference to one or more endpoints has largely been carried out on an ad hoc basis. Even with larger groups, the process and approach remain ad hoc and based on expert judgement. There still appears to be very little guidance on tools and approaches for grouping chemicals systematically. In November 2005, the ECB Workshop on Chemical Similarity and Thresholds of Toxicological Concern (TTC) Approaches was convened to identify the approaches that currently exist to encode similarity and how these can be used to facilitate the grouping of chemicals. This report aims to capture the main themes that were discussed. In particular, it outlines a number of different approaches that can facilitate the formation of chemical groupings in terms of the context under consideration and the likely information that would be required. Grouping methods were divided into four classes – knowledge-based, analogue-based, unsupervised, and supervised – and a flowchart was constructed to capture a possible workflow highlighting where and how these approaches might best be applied. JRC.I.3 - Toxicology and chemical substances.
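
    A minimal illustration of the "unsupervised" grouping class discussed in the report: fingerprint-based Tanimoto similarity as one crude way to encode chemical similarity. It uses RDKit (not a tool from the workshop), and the molecules and the 0.5 cut-off are arbitrary examples.

    ```python
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "c1ccccc1O"]
    mols = [Chem.MolFromSmiles(s) for s in smiles]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=1024) for m in mols]

    # Pairwise Tanimoto similarity; pairs above the cut-off are group candidates.
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            sim = DataStructs.TanimotoSimilarity(fps[i], fps[j])
            if sim >= 0.5:
                print(f"group candidates: {smiles[i]} ~ {smiles[j]} ({sim:.2f})")
    ```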

    Representing and describing nanomaterials in predictive nanoinformatics

    This Review discusses how a comprehensive system for defining nanomaterial descriptors can enable a safe-and-sustainable-by-design concept for engineered nanomaterials. Engineered nanomaterials (ENMs) enable new and enhanced products and devices in which matter can be controlled at a near-atomic scale (in the range of 1 to 100 nm). However, the unique nanoscale properties that make ENMs attractive may result in as yet poorly understood risks to human health and the environment. Thus, new ENMs should be designed in line with the idea of safe-and-sustainable-by-design (SSbD). The biological activity of ENMs is closely related to their physicochemical characteristics; changes in these characteristics may therefore cause changes in an ENM's activity. In this sense, a set of physicochemical characteristics (for example, chemical composition, crystal structure, size, shape, surface structure) creates a unique 'representation' of a given ENM. The usability of these characteristics, or nanomaterial descriptors (nanodescriptors), in nanoinformatics methods such as quantitative structure-activity/property relationship (QSAR/QSPR) models provides exciting opportunities to optimize ENMs at the design stage by improving their functionality and minimizing unforeseen health and environmental hazards. A computational screening of possible versions of novel ENMs would return optimal nanostructures and manage ('design out') hazardous features at the earliest possible manufacturing step. Safe adoption of ENMs on a vast scale will depend on the successful integration of the entire bulk of nanodescriptors extracted experimentally with data from theoretical and computational models. This Review discusses directions for developing appropriate nanomaterial representations and related nanodescriptors to enhance the reliability of computational modelling utilized in designing safer and more sustainable ENMs.
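
    A toy NanoQSAR sketch in the spirit of the review: map a nanodescriptor vector to a biological response with a regression model. The descriptor choice (core size, zeta potential, specific surface area) and every numeric value are invented for illustration, not data from the review.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Columns: core size (nm), zeta potential (mV), specific surface area (m2/g)
    X = np.array([[20, -30, 150], [50, -10, 80], [80, 5, 40], [15, -40, 200]])
    y = np.array([0.9, 0.5, 0.2, 1.1])  # synthetic toxicity endpoint

    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    # Screening step: predict the endpoint for a hypothetical new ENM design.
    print(model.predict([[30, -25, 120]]))
    ```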